Results 1 - 3 of 3
1.
IEEE Journal on Selected Areas in Communications; 41(1):107-118, 2023.
Article in English | Scopus | ID: covidwho-2245641

ABSTRACT

Video represents the majority of internet traffic today, driving a continual race between the generation of higher quality content, transmission of larger file sizes, and the development of network infrastructure. In addition, the recent COVID-19 pandemic fueled a surge in the use of video conferencing tools. Since videos take up considerable bandwidth (∼100 Kbps to a few Mbps), improved video compression can have a substantial impact on network performance for live and pre-recorded content, providing broader access to multimedia content worldwide. We present a novel video compression pipeline, called Txt2Vid, which dramatically reduces data transmission rates by compressing webcam videos ('talking-head videos') to a text transcript. The text is transmitted and decoded into a realistic reconstruction of the original video using recent advances in deep learning based voice cloning and lip syncing models. Our generative pipeline achieves two to three orders of magnitude reduction in the bitrate as compared to the standard audio-video codecs (encoders-decoders), while maintaining equivalent Quality-of-Experience based on a subjective evaluation by users (n = 242) in an online study. The Txt2Vid framework opens up the potential for creating novel applications such as enabling audio-video communication during poor internet connectivity, or in remote terrains with limited bandwidth. The code for this work is available at https://github.com/tpulkit/txt2vid.git. © 1983-2012 IEEE.
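
To put the claimed two-to-three-order-of-magnitude reduction in perspective, here is a back-of-the-envelope comparison in Python. This is our illustrative arithmetic, not figures from the paper; the speaking rate and average word length are assumed values.

# Back-of-the-envelope bitrate comparison (illustrative assumptions only).
WORDS_PER_MINUTE = 150    # assumed conversational speaking rate
BYTES_PER_WORD = 6        # assumed average word length plus a space

text_bps = WORDS_PER_MINUTE / 60 * BYTES_PER_WORD * 8   # transcript bitrate, ~120 bps
video_bps = 100_000       # low end of the ~100 Kbps webcam range cited above

print(f"transcript:   ~{text_bps:.0f} bps")
print(f"webcam video: ~{video_bps:,} bps")
print(f"reduction:    ~{video_bps / text_bps:.0f}x")   # ~833x, close to 3 orders of magnitude

Under these assumptions the transcript needs roughly 120 bps against 100 Kbps of video, which is consistent with the abstract's stated range.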

2.
IEEE Journal on Selected Areas in Communications; 1-1, 2022.
Article in English | Scopus | ID: covidwho-2152491

ABSTRACT

Video represents the majority of internet traffic today, driving a continual race between the generation of higher quality content, transmission of larger file sizes, and the development of network infrastructure. In addition, the recent COVID-19 pandemic fueled a surge in the use of video conferencing tools. Since videos take up considerable bandwidth (~100 Kbps to a few Mbps), improved video compression can have a substantial impact on network performance for live and pre-recorded content, providing broader access to multimedia content worldwide. We present a novel video compression pipeline, called Txt2Vid, which dramatically reduces data transmission rates by compressing webcam videos (“talking-head videos”) to a text transcript. The text is transmitted and decoded into a realistic reconstruction of the original video using recent advances in deep learning based voice cloning and lip syncing models. Our generative pipeline achieves two to three orders of magnitude reduction in the bitrate as compared to the standard audio-video codecs (encoders-decoders), while maintaining equivalent Quality-of-Experience based on a subjective evaluation by users (n = 242) in an online study. The Txt2Vid framework opens up the potential for creating novel applications such as enabling audio-video communication during poor internet connectivity, or in remote terrains with limited bandwidth. The code for this work is available at https://github.com/tpulkit/txt2vid.git. IEEE
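
The flow the abstract describes can be summarized in a minimal sketch, assuming the receiver holds a voice sample and a reference image of the speaker ahead of time. All names below are hypothetical stubs, not the authors' API; the actual implementation is in the linked repository.

# Minimal sketch of a Txt2Vid-style encode/transmit/decode flow.
# All names are hypothetical stubs, not the authors' API.
from dataclasses import dataclass

@dataclass
class SpeakerProfile:
    """Receiver-side state, shared once out of band."""
    voice_sample: bytes      # short recording used for voice cloning
    reference_frame: bytes   # still image of the speaker for lip syncing

def speech_to_text(audio: bytes) -> str:
    raise NotImplementedError("plug in any ASR model here")

def clone_voice(text: str, profile: SpeakerProfile) -> bytes:
    raise NotImplementedError("plug in a voice-cloning TTS model here")

def lip_sync(frame: bytes, audio: bytes) -> bytes:
    raise NotImplementedError("plug in a lip-syncing generator here")

def encode(audio: bytes) -> str:
    # Sender side: only this transcript crosses the network,
    # a few bytes per second instead of ~100 Kbps of video.
    return speech_to_text(audio)

def decode(transcript: str, profile: SpeakerProfile) -> bytes:
    # Receiver side: regenerate the voice, then animate the face.
    audio = clone_voice(transcript, profile)
    return lip_sync(profile.reference_frame, audio)

The design point is that the expensive generative models live at the two endpoints, while the channel carries only text.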

3.
6th International Conference on Computational Intelligence in Data Mining, ICCIDM 2021; 281:137-148, 2022.
Article in English | Scopus | ID: covidwho-1872352

ABSTRACT

The COVID-19 pandemic led to remote working and, as a result, to more video conferencing across all sectors; even important international conferences between nations are being conducted on online video conferencing platforms. Hence, a methodology capable of performing real-time end-to-end speech translation has become a necessity. In this paper, we propose a complete pipeline methodology that makes real-time video conferencing interactive and can be used in the education sector to generate videos of instructors from just their images and textual notes. We use automatic voice translation (AVT), text-to-stream machine translation (MT), and a text-to-voice generator for voice cloning and translation in real time. For video generation, we use generative adversarial networks (GANs), encoder-decoder architectures, and other previously implemented generative models. The proposed methodology has been implemented and tested on raw data and is effective for the specified application. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
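
The staged pipeline the abstract outlines (speech recognition, machine translation, voice-cloning synthesis, GAN-based video generation) can be sketched as below. Every function here is a hypothetical stub standing in for a trained model; none of it is code from the paper.

# Sketch of the staged real-time translation pipeline (hypothetical stubs).

def recognize_speech(audio: bytes, lang: str) -> str:
    raise NotImplementedError("AVT / speech recognition stage")

def machine_translate(text: str, src: str, dst: str) -> str:
    raise NotImplementedError("machine translation (MT) stage")

def synthesize_voice(text: str, voice_reference: bytes) -> bytes:
    raise NotImplementedError("text-to-voice stage with voice cloning")

def generate_talking_head(image: bytes, audio: bytes) -> bytes:
    raise NotImplementedError("GAN-based talking-head video generation")

def translate_conference_segment(audio: bytes, speaker_image: bytes,
                                 src_lang: str, dst_lang: str) -> bytes:
    # Chain the four stages: transcribe, translate, re-voice, animate.
    text = recognize_speech(audio, src_lang)
    translated = machine_translate(text, src_lang, dst_lang)
    dubbed = synthesize_voice(translated, voice_reference=audio)
    return generate_talking_head(speaker_image, dubbed)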
